Residual Algorithms: Reinforcement Learning with Function Approximation
Author

Leemon C. Baird III

Abstract
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that these algorithms can easily become unstable when implemented directly with a general function-approximation system, such as a sigmoidal multilayer perceptron, a radial-basis-function system, a memory-based learning system, or even a linear function-approximation system. A new class of algorithms, residual gradient algorithms, is proposed, which perform gradient descent on the mean squared Bellman residual, guaranteeing convergence. It is shown, however, that they may learn very slowly in some cases. A larger class of algorithms, residual algorithms, is proposed that has the guaranteed convergence of the residual gradient algorithms, yet can retain the fast learning speed of direct algorithms. In fact, both direct and residual gradient algorithms are shown to be special cases of residual algorithms, and it is shown that residual algorithms can combine the advantages of each approach. The direct, residual gradient, and residual forms of value iteration, Q-learning, and advantage learning are all presented. Theoretical analysis is given explaining the properties these algorithms have, and simulation results are given that demonstrate these properties.
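For a linear approximator V(s) = w·φ(s), the three families of update rules named in the abstract can be sketched in a few lines. The code below is an illustrative sketch, not the paper's implementation; the function name, feature vectors, and step sizes are invented here. The single parameter `beta` blends the direct update (`beta = 0`) with the pure residual gradient update (`beta = 1`).

```python
import numpy as np

def residual_update(w, phi_s, phi_s2, r, gamma, alpha, beta):
    """One residual-algorithm weight update for a transition (s, r, s').

    With a linear value function V(s) = w . phi(s):
      beta = 0 recovers the direct (TD-style) update,
      beta = 1 recovers the pure residual gradient update,
      0 < beta < 1 interpolates between the two.
    """
    # Bellman residual for this transition
    delta = r + gamma * np.dot(w, phi_s2) - np.dot(w, phi_s)
    # Descent direction: blend of the next-state and current-state gradients
    grad = beta * gamma * phi_s2 - phi_s
    return w - alpha * delta * grad
```

Repeatedly applying the update with `beta = 1` performs gradient descent on the squared Bellman residual of the sampled transition, so the residual shrinks toward zero; with `beta = 0` the step direction is simply `alpha * delta * phi_s`, the familiar direct update.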
Similar Papers
A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games
In this work, we propose a new fuzzy reinforcement learning algorithm for differential games that have continuous state and action spaces. The proposed algorithm uses function approximation systems whose parameters are updated differently from the updating mechanisms used in the algorithms proposed in the literature. Unlike the algorithms presented in the literature which use the direct algorit...
Adaptive Bases for Reinforcement Learning
We consider the problem of reinforcement learning using function approximation, where the approximating basis can change dynamically while interacting with the environment. A motivation for such an approach is maximizing the value function fitness to the problem faced. Three errors are considered: approximation square error, Bellman residual, and projected Bellman residual. Algorithms under the...
Multi-Player Residual Advantage Learning With General Function Approximation

A new algorithm, advantage learning, is presented that improves on advantage updating by requiring that a single function be learned rather than two. Furthermore, advantage learning requires only a single type of update, the learning update, while advantage updating requires two different types of updates, a learning update and a normalization update. The reinforcement learning system uses the ...
TD(0) Converges Provably Faster than the Residual Gradient Algorithm

In Reinforcement Learning (RL) there has been some experimental evidence that the residual gradient algorithm converges slower than the TD(0) algorithm. In this paper, we use the concept of asymptotic convergence rate to prove that under certain conditions the synchronous off-policy TD(0) algorithm converges faster than the synchronous off-policy residual gradient algorithm if the value function...
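The speed difference described in this snippet is easy to see empirically in the tabular (exact) case, where both synchronous sweeps can be written in closed form. The toy chain below (the transition matrix `P`, rewards `R`, and step size are invented for demonstration, not taken from either paper) applies both sweeps for the same number of iterations:

```python
import numpy as np

# Two-state deterministic cycle: state 0 -> 1 -> 0, reward 1 from state 0.
P = np.array([[0.0, 1.0], [1.0, 0.0]])
R = np.array([1.0, 0.0])
gamma, alpha = 0.9, 0.1
V_star = np.linalg.solve(np.eye(2) - gamma * P, R)  # exact value function

V_td = np.zeros(2)   # synchronous TD(0) iterate
V_rg = np.zeros(2)   # synchronous residual gradient iterate
for _ in range(500):
    delta_td = R + gamma * P @ V_td - V_td
    V_td = V_td + alpha * delta_td                   # TD(0): step along the residual
    delta_rg = R + gamma * P @ V_rg - V_rg
    # Residual gradient: descend on 0.5 * ||delta||^2, i.e. premultiply by (I - gamma P)^T
    V_rg = V_rg + alpha * (np.eye(2) - gamma * P).T @ delta_rg
```

After 500 sweeps the TD(0) iterate is close to `V_star`, while the residual gradient iterate still has a large error: the residual gradient error operator involves (I − γP)ᵀ(I − γP), whose smallest eigenvalue is roughly the square of that of (I − γP), so its slowest mode contracts much more slowly when γ is near 1.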
Regularized Policy Iteration

In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In order to implement a flexible function approximation scheme we propose the use of non-parametric methods with regularization, providing a convenient way to control the complexity of the function approximator. We propose two novel regularized policy iteration algorithms by adding L2-regularization to...
Publication date: 1995